final layer
where $\ell = 1, 2, \dots, L$ indexes the hidden layers ($\psi^{(1)}(r_i) = \psi(r_i)$, and $L$ is the final layer), ReLU is the nonlinear activation function, $W^{(\ell)} \in \mathbb{R}^{N \times N}$ is the weight matrix in layer $\ell$, and $b^{(\ell)} \in \mathbb{R}^{N}$ is the bias vector in layer $\ell$.
These molecular properties were calculated using a hybrid quantum simulation (Gaussian 09) at the B3LYP/6-31G(2df,p) level of theory. In this study, we created a subset of the QM9 dataset with a limited number of atoms per molecule, $M \le 14$, which we refer to as the "QM9under14atoms" dataset in the main text. As the learning and prediction targets, we selected three energy properties: atomization energy at 0 K, zero-point vibrational energy, and enthalpy at 298.15 K.
The LCAO takes into account the normalization of the coefficients in Eq. (6) in the main text. Additionally, the normalization term in Eq. (7) in the main text is calculated as follows: $Z(q_n, \zeta_n) =$
Layer Probing Improves Kinase Functional Prediction with Protein Language Models
Kumar, Ajit, Jha, IndraPrakash
Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase functional prediction using both unsupervised clustering and supervised classification. We show that mid-to-late transformer layers (layers 20-33) outperform the final layer by 32 percent in unsupervised Adjusted Rand Index and improve homology-aware supervised accuracy to 75.7 percent. Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline further strengthen reliability. Our results demonstrate that transformer depth contains functionally distinct biological signals and that principled layer selection significantly improves kinase function prediction.
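As a rough illustration of the unsupervised part of such a layer comparison, the sketch below scores per-layer cluster assignments against known labels with the Adjusted Rand Index and picks the best-scoring layer. It is not the authors' pipeline: the family labels, the cluster assignments, and the helper names `adjusted_rand_index` and `best_layer` are hypothetical, and real use would first extract embeddings from each ESM-2 layer and cluster them.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat as trivially identical
    # Number of item pairs that share a contingency cell, a true class, a cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single group
    return (sum_cells - expected) / (max_index - expected)


def best_layer(labels_true, clusters_by_layer):
    """Return (layer with the highest ARI, dict of ARI per layer)."""
    scores = {
        layer: adjusted_rand_index(labels_true, clusters)
        for layer, clusters in clusters_by_layer.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: six kinases from two families, clustered at two layers.
families = ["TK", "TK", "TK", "CMGC", "CMGC", "CMGC"]
clusters_by_layer = {
    20: [0, 0, 0, 1, 1, 1],  # assumed clustering of layer-20 embeddings
    33: [0, 0, 1, 1, 1, 0],  # assumed clustering of final-layer embeddings
}
layer, scores = best_layer(families, clusters_by_layer)
print(layer, scores)
```

The chance correction is what makes the index comparable across layers whose clusterings have different group sizes.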
Iterative Inference in a Chess-Playing Neural Network
Sandmann, Elias, Lapuschkin, Sebastian, Samek, Wojciech
Do neural networks build their representations through smooth, gradual refinement, or via more complex computational processes? We investigate this by extending the logit lens to analyze the policy network of Leela Chess Zero, a superhuman chess engine. Although playing strength and puzzle-solving ability improve consistently across layers, capability progression occurs in distinct computational phases with move preferences undergoing continuous reevaluation: move rankings remain poorly correlated with final outputs until late, and correct puzzle solutions found in middle layers are sometimes overridden. This late-layer reversal is accompanied by concept preference analyses showing final layers prioritize safety over aggression, suggesting a mechanism by which heuristic priors can override tactical solutions.
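The basic logit-lens idea, decoding every intermediate hidden state with the final output head and comparing the resulting rankings with the final one, can be sketched on a toy residual network. This is a minimal illustration only, not the Leela Chess Zero architecture or the authors' extension: the random weights, the dimensions, and the helper names (`toy_forward`, `logit_lens`, `spearman`) are all assumptions made for the example.

```python
import random


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def toy_forward(x, blocks):
    """Run a toy residual stack h <- h + ReLU(W h); return all hidden states."""
    states = [x]
    for weights in blocks:
        h = states[-1]
        update = [max(0.0, u) for u in matvec(weights, h)]
        states.append([a + b for a, b in zip(h, update)])
    return states


def logit_lens(hidden_states, unembed):
    """Decode every hidden state with the same final output head."""
    return [matvec(unembed, h) for h in hidden_states]


def ranks(values):
    """Rank positions 0..n-1 (ties, if any, are broken by index order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation, assuming no tied values."""
    ra, rb = ranks(a), ranks(b)
    mean = (len(ra) - 1) / 2
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var = sum((x - mean) ** 2 for x in ra)
    return cov / var


rng = random.Random(0)
dim, n_moves, n_layers = 8, 5, 4  # hypothetical sizes
blocks = [[[rng.gauss(0, 0.3) for _ in range(dim)] for _ in range(dim)]
          for _ in range(n_layers)]
unembed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_moves)]
x = [rng.gauss(0, 1) for _ in range(dim)]

per_layer_logits = logit_lens(toy_forward(x, blocks), unembed)
final_logits = per_layer_logits[-1]
correlations = [spearman(logits, final_logits) for logits in per_layer_logits]
for depth, rho in enumerate(correlations):
    print(f"layer {depth}: rank correlation with final output = {rho:.2f}")
```

Plotting such per-layer correlations against depth is one way to see whether a ranking converges gradually or only settles late in the network.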